Regulating Frequency of a Migrating Web Crawler based on Users Interest
Authors
Abstract
Due to the lack of efficient refresh techniques, current crawlers add unnecessary traffic to the already overloaded Internet. The frequency of visits to sites can be optimized by calculating refresh time dynamically. This improves the effectiveness of the crawling system by managing the revisit frequency of a website efficiently and by giving each type of website an appropriate chance to be crawled at an appropriate rate. In this paper we present an alternative approach for optimizing the frequency with which migrants visit web sites, based on users' interest. The proposed architecture adjusts the revisit frequency by dynamically assigning a revisit priority to each site, computed from previous experience of how many times the crawler found content changes in 'n' visits and from the interest users have shown in the website.
Keywords: Search Engine, Migrant, Frequency regulation, Dynamic priority, User interest
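To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of how a revisit priority and refresh interval could be derived from the change history of the last n visits and a user-interest signal. The weights, the interest measure, and the interval mapping are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SiteRecord:
    """Per-site state a crawler could keep to regulate revisit frequency."""
    url: str
    # 1 if content had changed at that visit, 0 otherwise (last n visits)
    change_history: deque = field(default_factory=lambda: deque(maxlen=10))
    user_interest: float = 0.0  # assumed normalized interest signal in [0, 1]


def revisit_priority(site: SiteRecord, w_change: float = 0.6, w_interest: float = 0.4) -> float:
    """Blend observed change rate over the last n visits with user interest.

    The weights are illustrative assumptions, not values from the paper.
    """
    n = len(site.change_history)
    change_rate = sum(site.change_history) / n if n else 1.0  # unseen sites get top priority
    return w_change * change_rate + w_interest * site.user_interest


def refresh_interval_hours(site: SiteRecord, base: float = 24.0) -> float:
    """Map priority to a revisit interval: higher priority, shorter interval."""
    return base / max(revisit_priority(site), 0.05)
```

A site whose content changed in most recent visits and that attracts high user interest thus gets a short refresh interval, while a static, rarely requested site is revisited far less often.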
Similar resources
Change detection in Migrating Parallel Web Crawler: A Neural Network Based Approach
Search engines are tools for Web site navigation and search. They maintain indices of web documents and provide search facilities by continuously downloading Web pages for processing. This process of downloading web pages is known as web crawling. In this paper we propose a neural network based change detection method for a migrating parallel web crawler. This method for Effective M...
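The snippet does not detail the neural model, so the sketch below only illustrates the general idea with a single sigmoid unit scoring hand-picked page-difference features; the features, weights, and threshold are assumptions, not taken from the cited work.

```python
import math


def change_score(features: list[float], weights: list[float], bias: float) -> float:
    """Single sigmoid unit standing in for a learned change detector.

    The features (e.g. relative size difference, fraction of changed links,
    days since last modification) and the weights are assumptions.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Flag the page as changed when the score crosses 0.5.
    score = change_score([0.3, 0.1, 2.0], weights=[1.5, 2.0, 0.4], bias=-1.0)
    print("changed" if score > 0.5 else "unchanged", round(score, 3))
```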
Clustered Based User-Interest Ontology Construction for Selecting Seed URLs of Focused Crawler
With the increasing number of web pages accessible on the Internet, it has gradually become difficult for users to find the web pages that are relevant to their particular needs. Knowledge about computer users is very beneficial for assisting them and predicting their future actions. Seed URL selection for a focused Web crawler aims to guide the crawl toward related and valuable information that meets a user's perso...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download domain-specific web pages, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them a key technique is focused crawling, which is able to crawl particular topical...
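As a rough illustration of prioritized URL ordering in a focused crawler, the sketch below keeps the frontier in a max-heap keyed by a topical-relevance score; how that score is computed (e.g. from parent-page similarity to the topic) and the example URLs are assumptions, not the cited paper's method.

```python
import heapq


class FocusedFrontier:
    """URL queue ordered by topical relevance (highest score crawled first)."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._seen: set[str] = set()

    def push(self, url: str, relevance: float) -> None:
        # relevance is assumed to come from, e.g., parent-page similarity to the topic
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-relevance, url))  # negate for a max-heap

    def pop(self) -> str:
        _, url = heapq.heappop(self._heap)
        return url


frontier = FocusedFrontier()
frontier.push("http://example.org/on-topic", relevance=0.9)
frontier.push("http://example.org/off-topic", relevance=0.2)
print(frontier.pop())  # the on-topic URL is crawled first
```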
DHT-Based Distributed Crawler
A search engine, like Google, is built using two pieces of infrastructure: a crawler that indexes the web and a searcher that uses the index to answer user queries. While Google's crawler has worked well, there are issues of timeliness and the lack of control given to end-users to direct the crawl according to their interests. The interface presented by such search engines is hence very limite...
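One common way a DHT-style crawler partitions work, sketched below under the assumption of consistent hashing on hostnames, is to map each host to the crawler node that owns its slice of the key space; this is a generic illustration, not the architecture from the cited work.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Consistent-hash ring that assigns each hostname to one crawler node."""

    def __init__(self, nodes: list[str], replicas: int = 50) -> None:
        # Place several virtual points per node on the ring for better balance.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, hostname: str) -> str:
        idx = bisect_right(self._keys, self._hash(hostname)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["crawler-1", "crawler-2", "crawler-3"])
print(ring.node_for("example.org"))  # the node responsible for crawling this host
```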
Publication date: 2012